A closer look on hierarchical spectro-temporal features (HIST)
Abstract
Speech recognition that is robust against interfering noise remains a difficult task. We previously presented a set of spectro-temporal speech features, termed Hierarchical Spectro-Temporal (HIST) features, which showed improved robustness, especially when combined with RASTA-PLP. They are inspired by the receptive fields found in the mammalian auditory cortex and are organized in two hierarchical levels. A set of filters learned via ICA captures local variations and constitutes the first layer of the hierarchy. In the second layer these local variations are combined to form larger receptive fields learned via Non-Negative Sparse Coding. In this paper we introduce a non-linear smoothing along the time axis of the spectrograms at the input to the hierarchy and, additionally, present a more thorough performance analysis on an isolated and a continuous digit recognition task. The results show that the combination of HIST and RASTA-PLP features yields improved recognition scores in noise.
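The data flow described in the abstract (time-axis smoothing, a first layer of local filters, a second layer that combines them into larger receptive fields) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration: the abstract does not say which non-linear smoothing is used, so a median filter stands in for it; the random arrays stand in for filters that would be learned via ICA and Non-Negative Sparse Coding; the absolute-value rectification and all function names are hypothetical.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def median_smooth_time(spec, width=3):
    """Median-filter each frequency channel along the time axis (axis 1).

    Assumption: a median filter is only one possible non-linear smoother;
    the abstract does not specify the one actually used. `width` must be odd.
    """
    pad = width // 2
    padded = np.pad(spec, ((0, 0), (pad, pad)), mode="edge")
    windows = sliding_window_view(padded, width, axis=1)  # (F, T, width)
    return np.median(windows, axis=-1)


def first_layer(spec, filters):
    """Correlate the spectrogram with K local filters of shape (K, h, w).

    The absolute value is an assumed rectification step.
    """
    _, h, w = filters.shape
    patches = sliding_window_view(spec, (h, w))  # (F', T', h, w)
    return np.abs(np.einsum("ftij,kij->kft", patches, filters))


def second_layer(maps, weights):
    """Combine first-layer maps with non-negative weights of shape (M, K, h, w)."""
    _, _, h, w = weights.shape
    patches = sliding_window_view(maps, (h, w), axis=(1, 2))  # (K, F', T', h, w)
    return np.einsum("kftij,mkij->mft", patches, weights)


def hist_sketch(spec, filters, weights, width=3):
    """Run smoothing, then both layers, and return the second-layer maps."""
    return second_layer(first_layer(median_smooth_time(spec, width), filters), weights)


rng = np.random.default_rng(0)
spec = rng.random((16, 40))               # toy spectrogram: 16 channels, 40 frames
filters = rng.standard_normal((4, 5, 5))  # stand-in for ICA-learned filters
weights = rng.random((3, 4, 3, 3))        # stand-in for non-negative sparse-coding bases
features = hist_sketch(spec, filters, weights)
```

With these toy sizes the first layer yields 4 maps of 12 x 36 and the second layer 3 maps of 10 x 34, because both layers use "valid" windows that shrink the maps at the borders.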
Related papers
A hierarchical framework for spectro-temporal feature extraction
In this paper we present a hierarchical framework for the extraction of spectro-temporal acoustic features. The design of the features targets higher robustness in dynamic environments. Motivated by the large gap between human and machine performance in such conditions we take inspirations from the organization of the mammalian auditory cortex in the design of our features. This includes the jo...
Supervised vs. unsupervised learning of spectro temporal speech features
To overcome limitations of purely spectral speech features we previously introduced Hierarchical Spectro-Temporal (HIST) features. We could show that a combination of HIST and standard features reduces recognition errors in both clean and noisy conditions. The HIST features consist of two hierarchical layers where the corresponding filter functions are learned in a data-driven way. In this paper we inve...
Investigating the Complementarity of Spectral and Spectro-temporal Features
Most common speech features, such as Mel Frequency Cepstral Coefficients (MFCCs) and RelAtive SpecTrAl Perceptual Linear Predictive (RASTA-PLP) features, use only spectral information. However, from measurements in the mammalian auditory cortex it is known that the mammalian brain jointly uses spectral and temporal information. To model this we previously developed Hierarchical Spectro-Temporal (HIST) features [1...
Phoneme Classification Using Temporal Tracking of Speech Clusters in Spectro-temporal Domain
This article presents a new feature extraction technique based on the temporal tracking of clusters in the spectro-temporal feature space. In the proposed method, auditory cortical outputs were clustered. The attributes of speech clusters were extracted as secondary features. However, the shape and position of speech clusters change over time. The clusters were temporally tracked and temporal tra...
Generalization performance of spectro-temporal speech features
Despite the fact that the dynamic aspects of speech are very important, conventional speech features such as Mel Frequency Cepstral Coefficients (MFCCs) [1] and RelAtive SpecTrAl Perceptual Linear Predictive (RASTA-PLP) features [2] capture only stationary spectral information. We could previously show that a combination of conventional speech features with spectro-temporal speech features yield...